manipulating collections of items representing external
information objects; a story editor (40) for providing a
graphical representation of a story; wherein an update
manager onto a textual story provided in the story editor
(40) or onto the graphical story provided in the diagram
editor (50). The overlaid items are compared with the
information in the textual story or the graphical story, and
the result of the comparison is indicated in the story
editor or in the diagram editor.

Description

CROSS-REFERENCE 

[0001] This application is a continuation-in-part
application of Application Serial No. 09/853,115, filed May
22, 2001, and titled "Software System for Biological
Storytelling", which is incorporated herein by reference in
its entirety and to which application we claim priority
under 35 USC §120.

FIELD OF THE INVENTION

[0002] The present invention pertains to software sys- 
tems supporting the information synthesis activities of 
organizing, using, and sharing diverse, complex infor- 
mation. 

BACKGROUND OF THE INVENTION

[0003] As in many fields, research in molecular biology
moves through an initial phase involving the formulation
of models or hypotheses, into a middle phase
where these hypotheses are tested through experiment.

[0004] In the early phase of model building and
hypothesis formation, the user engages in speculation and
hypothesis formation, identifying key elements, genes
and proteins in molecular biology, and possible
interactions of those key elements. In this early phase, the user
is inferring causal relationships from correlations in test
data, forming hypotheses which are to be refined and
possibly tested.

[0005] The user in the field of molecular biology faces 
a daunting task in this early phase of model building. 
Unlike earlier endeavors where the number of possible 
variables was small, and experiments few and con- 
tained, users in molecular biology deal with enormous 
problems of scope. 

[0006] Key elements, such as genes or proteins of in- 
terest, may number in the thousands, and the potential 
interactions may number in the billions. A single micro- 
array experiment may produce megabytes of numerical 
data. The data is too large in scope to be held in the 
user's head. 

[0007] To add to this problem, the user is faced with 
piecing together information from diverse sources and 
in different forms. This information is also geographical- 
ly diverse, both in content and form, and may include 
public and private databases, textual information from 
publications, and experimental data both raw and
refined. This data is also at multiple levels of abstraction,
ranging from raw numerical gene expression data from
microarray experiments to textual descriptions of
cellular processes.

[0008] The user must synthesize information in various
forms from various sources into high level models
when developing hypotheses and explanations. Often,
there is a need to consider multiple hypotheses and
alternative explanations in parallel. Moreover, users often
work in teams, so there is a need to accommodate multiple
perspectives and different views of the same data.
Further, hypothesis formulation is a "top-down" reasoning
process, whereas the exploratory analysis of detailed
experimental data is a "bottom-up" process. In order
to be effective in formulating hypotheses, the user
needs to reconcile the "top-down" and "bottom-up"
perspectives, to ensure that the "top-down" explanations
are consistent with the actual experimental data.

[0009] Very few tools exist to support this abstraction
and exploration process. What is needed is a system for
assisting users in the organization, using, and sharing
of this diverse biological information.

SUMMARY OF THE INVENTION 

[0010] The present invention provides a system for
organizing information across external information objects
which may include any and all of the following
components: a results manager for viewing detailed
experimental results; a story editor for providing a narrative
structure for textually organizing information about
interactions between items; a collection manager for
creating and manipulating collections of items representing
external information objects; a diagram editor for
incorporating items, collections and interactions into a
graphical representation of a story; and an object editor for
adding or manipulating annotations to information within
the system.

[0011] Means for importing experimental data from
external sources may be provided with the results
manager. For biological applications, these external sources
include, but are not limited to, DNA microarray
experimental results, relative protein abundance measures
derived from mass spectrometry, and protein fragment
data derived from gel electrophoresis experiments.

[0012] Multiple results manager viewers may be used
simultaneously, for viewing and manipulating multiple
sets of data.

[0013] The story editor component may also include
means for importing information from external sources,
in addition to the capability of allowing direct input
thereto by the user. The story editor may be further provided
with means for importing items from the other
components. Each of the components may be provided with
the capability of importing from the other components.
The components may be linked so that editing
information within one component automatically updates the
other components in the same way.

[0014] The object editor is adapted to annotate an
item or interaction with a textual description. Other
components, such as the story editor, may also include
means for annotating an item or interaction with a textual
description.

[0015] The collection manager is adapted to group
related items together as a collection. Further, collections
may be nested, i.e., a collection may contain one or
more other collections, in addition to single items. The
collections may be free-form sets of items. The
collection manager may be provided with means for
text-mining scientific literature to form collections. The collection
manager may be adapted to semi-automatically import
information and form collections. The collections may
include links to external information.
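
The nesting described in paragraph [0015] can be pictured with a small sketch. This is only an illustration under assumptions: the class names `Item` and `Collection`, the method `iter_items`, and the example gene names are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Item:
    """Stands in for one external information object (hypothetical type)."""
    name: str


@dataclass
class Collection:
    """A free-form grouping whose members may be items or other collections."""
    name: str
    members: List[Union["Item", "Collection"]] = field(default_factory=list)

    def add(self, member: Union["Item", "Collection"]) -> None:
        self.members.append(member)

    def iter_items(self) -> Iterator[Item]:
        """Yield every item in order, descending into nested collections."""
        for member in self.members:
            if isinstance(member, Collection):
                yield from member.iter_items()
            else:
                yield member


# Illustrative data only: a collection nested inside another collection.
kinases = Collection("kinases", [Item("RAF1"), Item("MAP2K1")])
pathway = Collection("signaling", [Item("KRAS"), kinases])
names = [item.name for item in pathway.iter_items()]
```

Because a collection holds either kind of member in one list, a single recursive walk is enough to flatten any depth of nesting.
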
[0016] The system may further include means for
overlaying information from one or more components
onto another component.

[0017] The diagram editor may include means for 
generating nodes corresponding to items and means for 
generating links between nodes which correspond to in- 
teractions. The diagram editor may include means for 
adding arbitrary nodes or links to the graphical organi- 
zation. 

[0018] The system may further be provided with
means for tagging each annotation made with the name
of a user who created it and with a time stamp indicating
the time of creation of that annotation. The annotations
may include text, data, pointers to external objects and/
or pointers to external data, for example.
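
A minimal sketch of the tagging in paragraph [0018], assuming an annotation is a small immutable record. The names `Annotation` and `annotate`, the choice of UTC, and the sample text are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    """An explanatory note tagged with its creator and creation time."""
    text: str
    author: str
    created_at: datetime


def annotate(text: str, author: str) -> Annotation:
    """Create an annotation stamped with the author and the current UTC time."""
    return Annotation(text=text, author=author,
                      created_at=datetime.now(timezone.utc))


# Hypothetical user name and note text.
note = annotate("Expression drops after treatment.", author="jdoe")
```

Freezing the record keeps the author and time stamp from being altered after creation, which is one way to keep the provenance trustworthy.
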
[0019] The system may further include means for 
generating a web repository, wherein the web repository 
includes a web page for each item. 
[0020] The system may further be provided with 
means for saving work in progress. 
[0021] The story editor may include a syntax-directed 
tree editor having means for identifying players to de- 
scribe entities that play an active role in a story de- 
scribed, and means for defining hypotheses about inter- 
actions between the players. 

[0022] Further, the story editor may include means for
summarizing the story described as a theme, means for
defining alternative hypotheses describing possible
alternative interactions between the players; and/or
means for documenting supporting and opposing
statements and/or citations in support of or in opposition to
one or more hypotheses, respectively.
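
The story elements named in paragraphs [0021] and [0022] (theme, players, hypotheses, alternatives, supporting and opposing statements) suggest a tree. The sketch below is one possible shape for such a tree, not the story grammar of Fig. 9; all class, field, and function names, and the sample story, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    statement: str
    support: List[str] = field(default_factory=list)  # supporting statements/citations
    oppose: List[str] = field(default_factory=list)   # opposing statements/citations
    alternatives: List["Hypothesis"] = field(default_factory=list)


@dataclass
class Story:
    theme: str
    players: List[str] = field(default_factory=list)
    hypotheses: List[Hypothesis] = field(default_factory=list)


def outline(story: Story) -> List[str]:
    """Flatten the story tree into indented outline lines."""
    lines = [f"Theme: {story.theme}"]
    lines += [f"  Player: {p}" for p in story.players]

    def walk(h: Hypothesis, depth: int, label: str) -> None:
        pad = "  " * depth
        lines.append(f"{pad}{label}: {h.statement}")
        lines.extend(f"{pad}  Support: {s}" for s in h.support)
        lines.extend(f"{pad}  Oppose: {o}" for o in h.oppose)
        for alt in h.alternatives:
            walk(alt, depth + 1, "Alternative")

    for h in story.hypotheses:
        walk(h, 1, "Hypothesis")
    return lines


# Illustrative story content only.
story = Story(
    theme="Gene A mutation drives tumor growth",
    players=["gene A", "protein B"],
    hypotheses=[
        Hypothesis(
            "A activates B",
            support=["citation 1"],
            alternatives=[Hypothesis("A inhibits a repressor of B",
                                     oppose=["citation 2"])],
        )
    ],
)
lines = outline(story)
```
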
[0023] The story editor may be provided with means 
for importing items from scientific text, graphical data or 
experimental data. 

[0024] A method of organizing information across
external information objects is described to include:
importing information of diverse types from diverse
sources; organizing the information into concepts and
categories using a free-form database model; and
formulating and documenting tentative explanations and
hypotheses using the free-form database model.

[0025] Further, the method may include the step of
attaching citations to the information by cutting and
pasting or dragging and dropping the citations. The citations
may be selected from Web references, files, free-form
text, and graphic elements, for example.
[0026] A web repository of the organized information,
explanations and hypotheses may be provided, for
access by others. The method may further include
incorporating verification and feedback from others who
access the organized information, explanations and
hypotheses and provide verification and feedback.

[0027] Preferably, the systems and methods provided
are for use in organizing biological information, but they
are not limited thereto, and can be used for other
informational organization applications.
[0028] A free-form database model, embodied in
software components, is provided, to include: items which
represent external information objects; collections of
items; textual stories describing the items, collections
and interactions between the items, collections, and
items and collections; and graphical stories describing
the items, collections and interactions between the
items, collections, and items and collections.

[0029] The free-form database model may further be
provided with means for saving and restoring work in
progress.

[0030] A method of verifying and validating
experimental data is provided to include: importing the
experimental data into a results manager; overlaying the
values of items selected from the results manager onto a
textual story provided in a story editor or onto a graphical
story in a diagram editor; and comparing the overlaid
items with the information in the textual story or
graphical story.

[0031] The overlaying may be performed by selecting
the cell in the results manager that corresponds to an
experimental result for that item, for example. Both the
diagram editor and the story editor have code that
"listens" for column-selected events, which are fired when
a cell in the table is selected. That "listener" code then
calls the routines that do the overlaying automatically.
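
The listener arrangement of paragraph [0031] resembles the familiar observer pattern. The following is a rough sketch of that pattern, assuming plain callbacks rather than any particular GUI toolkit; `ResultsTable`, `Editor`, `add_selection_listener`, `select`, and the sample values are all hypothetical names and data, not the system's actual code.

```python
from typing import Callable, Dict, List, Optional


class ResultsTable:
    """Stand-in for a results manager table that fires selection events."""

    def __init__(self, values: Dict[str, float]) -> None:
        self._values = values
        self._listeners: List[Callable[[str, float], None]] = []

    def add_selection_listener(self, listener: Callable[[str, float], None]) -> None:
        self._listeners.append(listener)

    def select(self, item_name: str) -> None:
        """Simulate the user selecting the cell for item_name; notify listeners."""
        value = self._values[item_name]
        for listener in self._listeners:
            listener(item_name, value)


class Editor:
    """Stand-in for a story or diagram editor that shows overlaid values."""

    def __init__(self, label: str, items: List[str]) -> None:
        self.label = label
        self.overlay: Dict[str, Optional[float]] = {name: None for name in items}

    def on_selected(self, item_name: str, value: float) -> None:
        # Overlay the value only if this editor displays the item.
        if item_name in self.overlay:
            self.overlay[item_name] = value


table = ResultsTable({"geneA": 2.5, "geneB": 0.4})
story_editor = Editor("story", ["geneA"])
diagram_editor = Editor("diagram", ["geneA", "geneB"])
table.add_selection_listener(story_editor.on_selected)
table.add_selection_listener(diagram_editor.on_selected)
table.select("geneA")
```

The table knows nothing about the editors beyond the callbacks they register, so further viewers could subscribe without changes to the table.
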
[0032] A computer-readable medium carrying one or
more sequences of instructions from a user of a
computer system user for organizing information across
external information objects is provided, wherein the
execution of the one or more sequences of instructions by
one or more processors cause the one or more
processors to perform the steps of: importing information of
diverse types from diverse sources; organizing the
information into concepts and categories using a free-form
database model; and formulating and documenting
tentative explanations and hypotheses using the free-form
database model.

[0033] The formulation and documentation may
include generating a story utilizing a story grammar and/
or generating a graphical story.

[0034] A further step of attaching citations to the
information by cutting and pasting or dragging and
dropping the citations may be performed.

[0035] Still further, a web repository of the organized
information, explanations and hypotheses may be
provided for access by others. The step of incorporating
verification and feedback from others who access the
organized information, explanations and hypotheses
and provide said verification and feedback may also be
performed.

[0036] The information is preferably, but not
necessarily, biological information.

[0037] A computer-readable medium carrying one or 
more sequences of instructions from a user of a
computer system user for organizing information across
external information objects is provided, wherein the
execution of the one or more sequences of instructions by
one or more processors cause the one or more proces- 
sors to perform the steps of: generating a results man- 
ager for importing and viewing detailed experimental re- 
sults as one type of representation of external informa- 
tion objects; generating a collection manager for creat- 
ing and manipulating collections of items representing 
external information objects; generating a story editor 
based on a narrative grammar for incorporating said 
items and collections into the narrative grammar to form 
a story; generating a diagram editor for incorporating 
items, collections and interactions into a graphical rep- 
resentation of a story; and generating an object editor 
for adding or manipulating annotations to information 
within the system. 

[0038] These and other objects, advantages, and fea- 
tures of the invention will become apparent to those per- 
sons skilled in the art upon reading the details of the 
systems, methods and tools as more fully described be- 
low. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0039] The present invention is described with re- 
spect to particular exemplary embodiments thereof and 
reference is made to the drawings in which: 
[0040] Fig. 1 shows examples of main windows of the
present invention;
[0041] Fig. 2 shows an Object Editor for an item
according to the present invention;
[0042] Fig. 3 shows a File menu according to the
present invention;
[0043] Fig. 4 shows a Results Manager window
according to the present invention;
[0044] Fig. 5 shows a Collection Manager window
according to the present invention;
[0045] Fig. 6 shows a Collection Manager menu
according to the present invention;
[0046] Fig. 7 shows a Web browser view of a story
according to the present invention;
[0047] Fig. 8 shows a story in tree form, in a Story
Editor according to the present invention;
[0048] Fig. 9 shows a story grammar according to the
present invention;
[0049] Fig. 10 shows a generated Web page for an
item according to the present invention;
[0050] Fig. 11 shows a Diagram Editor window
according to the present invention; and
[0051] Fig. 12 shows a Tools menu according to the
present invention.



DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS 

[0052] Before the present system, tools and methods
are described, it is to be understood that this invention
is not limited to particular viewers, tools, commands or
steps described, as such may, of course, vary. It is also
to be understood that the terminology used herein is for
the purpose of describing particular embodiments only,
and is not intended to be limiting, since the scope of the
present invention will be limited only by the appended
claims.

[0053] Where a range of values is provided, it is
understood that each intervening value, to the tenth of the
unit of the lower limit unless the context clearly dictates
otherwise, between the upper and lower limits of that
range is also specifically disclosed. Each smaller range
between any stated value or intervening value in a
stated range and any other stated or intervening value in
that stated range is encompassed within the invention.
The upper and lower limits of these smaller ranges may
independently be included or excluded in the range, and
each range where either, neither or both limits are
included in the smaller ranges is also encompassed within
the invention, subject to any specifically excluded limit
in the stated range. Where the stated range includes
one or both of the limits, ranges excluding either or both
of those included limits are also included in the
invention.

[0054] Unless defined otherwise, all technical and
scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art
to which this invention belongs. Although any methods
and materials similar or equivalent to those described
herein can be used in the practice or testing of the
present invention, the preferred methods and materials
are now described. All publications mentioned herein
are incorporated herein by reference to disclose and
describe the methods and/or materials in connection with
which the publications are cited.

[0055] It must be noted that as used herein and in the
appended claims, the singular forms "a", "and", and
"the" include plural referents unless the context clearly
dictates otherwise. Thus, for example, reference to "a
viewer" includes a plurality of such viewers and
reference to "the data set" includes reference to one or more
data sets and equivalents thereof known to those skilled
in the art, and so forth.

[0056] The publications discussed herein are provided
solely for their disclosure prior to the filing date of the
present application. Nothing herein is to be construed
as an admission that the present invention is not entitled
to antedate such publication by virtue of prior invention.
Further, the dates of publication provided may be
different from the actual publication dates which may need to
be independently confirmed.

DEFINITIONS

[0057] The term "activation" refers to enhancement of
the effects of a biological agent or stimulation of a
biological or chemical process, for example.
[0058] The term "alternative", when used in the
context of describing a biological story, refers to one choice
among a number of possible explanations (or
hypotheses) for a biological phenomenon.
[0059] The term "amino acid" refers to a molecular
sub-unit of a protein, containing an amino group,
carboxyl group, and side chain attached to a carbon atom.
[0060] The term "analysis" is used herein to refer to a
separation of a material or abstract entity into its
constituent elements, as a method of studying its nature or
determining its essential features.
[0061] The term "annotation" is used herein to refer
to an explanatory or critical note that may be associated
with any item, collection, story element, diagram node,
or diagram interaction.

[0062] The term "biological story" defines a high-level
description or explanation of a complex biological
process, formulated by a researcher, for example, the "story"
of how a mutation in a gene may lead to a cascade of
events leading to a form of cancer.
[0063] The term "bottom-up analysis" refers to an in- 
ductive process of inferring patterns, concepts, and oth- 
er higher-level information, beginning from detailed, 
constituent data. 

[0064] The term "canvas" is used to describe a user
interface component, typically in a graphical or textual
editor, upon which a user can enter information, such as
sketches or notes.

[0065] The term "cell", when used in the context
describing a data table, refers to the data value at the
intersection of a row and column in a spreadsheet-like
data structure; typically a property/value pair for an entity
in the spreadsheet, e.g. the expression level for a gene.
[0066] The term "cell cycle" refers to the biological
process and phases of division and proliferation of a
living cell.

[0067] The term "cell localization" refers to the
location in a cell where a given biological entity, such as a
protein, is concentrated, e.g. the plasma membrane,
cytosol, nucleus, or organelles.

[0068] A "citation" is a quotation from or reference to
an authority.

[0069] The term "collection" refers to free-form
groupings or sets of related information. Collections can also
be called or thought of as "categories" or "concepts".
[0070] The term "Collection Manager" defines a
software component and user interface for viewing and
manipulating collections.

[0071] "Color coding" refers to a software technique
which maps a numerical or categorical value to a color
value, for example representing high levels of gene
expression as a reddish color and low levels of gene
expression as greenish colors, with varying shade/intensities
of these colors representing varying degrees of
expression.
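
As a rough illustration of the color coding defined in paragraph [0071], the sketch below maps a numerical value onto a green-to-red scale. The function name `expression_color`, the default range of -2.0 to 2.0, and the black midpoint are assumptions chosen for the example; the specification does not fix a particular scale.

```python
def expression_color(value: float, low: float = -2.0, high: float = 2.0) -> str:
    """Map a value to a hex color: green at `low`, black midway, red at `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp out-of-range values to the ends of the scale.
    clamped = max(low, min(high, value))
    # Rescale to [-1, 1]: negative means low expression, positive means high.
    t = 2.0 * (clamped - low) / (high - low) - 1.0
    red = round(255 * max(t, 0.0))
    green = round(255 * max(-t, 0.0))
    return f"#{red:02x}{green:02x}00"


high_color = expression_color(2.0)
low_color = expression_color(-2.0)
mid_color = expression_color(0.0)
```

Intermediate values produce darker shades of red or green, which corresponds to the "varying shade/intensities" mentioned in the definition.
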

[0072] "Copying/cutting and pasting" refers to a user
interface technique for moving or copying a data item
from one view to another. A typical mechanism for
copying and pasting is to (1) select the data item to be cut/
copied, (2) perform cut/copy operation, either via a
menu or via keyboard sequence, such as Cntl-X, (3)
select data item into which the moved/copied data item is
to be incorporated, and (4) perform paste operation,
either via a menu or via keyboard sequence, such as Cntl-

[0073] The term "data mining" refers to a
computational process of extracting higher-level knowledge from
patterns of data in a database. Data mining is also
sometimes referred to as "knowledge discovery".
[0074] The term "Diagram Editor" refers to a software
component for presenting and manipulating biological
process diagrams, such as signal transduction
pathways and protein/protein interaction maps. A Diagram
Editor can be thought of as a graphical mechanism for
putting together a biological story. More generally, a
Diagram Editor can be used to present and manipulate
process diagrams outside of the biological realm.

[0075] The term "diagram interaction" refers to the
representation, in the Diagram Editor, of a process or
relationship involving two or more biological entities in
the case of a biological diagram, e.g. a protein/protein
binding interaction or a protein/gene inhibitory
interaction. More generally, diagram interaction refers to the
representation of the process or relationship between
two or more entities in a diagram by the Diagram Editor.
[0076] A "diagram node" or "node" is the
representation, in the Diagram Editor, of a specific item, collection,
or Player.

[0077] The term "dragging and dropping" refers to a
user interface technique for moving or copying a data
item from one view to another. A typical mechanism for
dragging and dropping is to (1) select the data item to
be cut/copied or moved, (2) while holding down the
mouse button, move the mouse sprite over to the data
item into which the moved/copied data item is to be
incorporated, and (3) release the mouse button when
mouse sprite is over the data item into which the moved/
copied data item is to be incorporated. Holding down the
Cntl- key when mouse button is depressed results in
copying of the source item; otherwise, the source item
is moved out of source position and into destination.
[0078] A "drop point" is a location where the mouse
button is released during a drag/drop operation.

[0079] A "file chooser" is a user interface component
for navigating a directory/folder tree and selecting a file
desired for an operation, which is based upon file
navigation mechanisms in Microsoft Windows and Apple
Macintosh operating systems.

[0080] A "file header" is auxiliary information
prepended to a data file, typically used to define fields,
value types, and other structural information about the data
in the file; for example, specifying whether data in a
particular column is to be treated as text or as a numerical
value.

[0081] A "file menu" is a user-interface mechanism for
choosing one of a number of possible file-related
operations, e.g. importing a gene expression data set.
[0082] A "free-form data model" refers to a model for
data representation and storage which, in contrast to a
formal, fixed database model, allows for the entry of
arbitrary data before the definition of database tables. This
allows the user to "add data now, categorize later".
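
One way to picture "add data now, categorize later" is a store that accepts arbitrary properties without a predeclared schema and lets categories be attached afterwards. The sketch below is only an illustration; `FreeFormStore`, its methods, and the sample entries are hypothetical and are not the model's actual implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Set


class FreeFormStore:
    """Holds items with arbitrary properties; no table definition is required."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}
        self.categories: Dict[str, Set[str]] = defaultdict(set)

    def add(self, name: str, **properties: Any) -> None:
        """Add data now: any property names are accepted."""
        self.items.setdefault(name, {}).update(properties)

    def categorize(self, name: str, category: str) -> None:
        """Categorize later: attach an already stored item to a category."""
        if name not in self.items:
            raise KeyError(name)
        self.categories[category].add(name)


store = FreeFormStore()
store.add("TP53", kind="gene")
store.add("note-1", text="unexplained band on gel", source="lab notebook")
store.categorize("TP53", "tumor suppressors")
```

The two entries carry different property names, which a fixed table definition would have had to anticipate in advance.
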
[0083] The term "differentiation" refers to a process
by which unspecialized cells acquire specialized
structural and functional properties.

[0084] The term "down-regulation" is used in the
context of gene expression, and refers to a decrease in the
amount of messenger RNA (mRNA) formed by
expression of a gene, with respect to a control.
[0085] The term "Explanations" is used to refer to a
set of assumptions, clarifications, and hypotheses that
constitute the "plot" of a biological story.
[0086] "Gel electrophoresis" refers to a biological
technique for separating and measuring amounts of
protein fragments in a sample. Migration of a protein
fragment across a gel is proportional to its mass and charge.
Different fragments of proteins, prepared with stains, will
accumulate on different segments of the gel. Relative
abundance of the protein fragment is proportional to the
intensity of the stain at its location on the gel.
[0087] The term "gene" refers to a unit of hereditary
information, which is a portion of DNA containing
information required to determine a protein's amino acid
sequence.
[0088] "Gene expression" refers to the level to which
a gene is transcribed to form messenger RNA
molecules, prior to protein synthesis.

[0089] "Gene expression ratio" is a relative
measurement of gene expression, wherein the expression level
of a test sample is compared to the expression level of
a reference sample.
[0090] A "gene product" is a biological entity that can
be formed from a gene, e.g. a messenger RNA or a
protein.

[0091] A "growth factor" refers to one of a group of
peptides that is highly effective in stimulating cell
division and/or differentiation of certain cell types.
[0092] A "heat map" is a visual representation of a
tabular data structure of gene expression values, wherein
color codings are used for displaying numerical values.
The numerical value for each cell in the data table is
encoded into a color for the cell. Color encodings run on
a continuum from one color through another, e.g. green
to red or yellow to blue for gene expression values. The
resultant color matrix of all rows and columns in the data
set forms the color map, often referred to as a "heat
map" by way of analogy to modeling of thermodynamic
data.

[0093] "HTML" or "HyperText Markup Language"
refers to a system of standards used to tag the elements
of a hypertext document, and is a standard for
documents on the World Wide Web.
[0094] A "hypothesis" refers to a provisional theory or
assumption set forth to explain some class of
phenomenon.

[0095] "Hypertext" refers to data, as text, graphics,
video, or sound, stored in a computerized document so
that a user can move non-sequentially through a link
from one document to another.

[0096] The term "Import Bio Data" refers to a user
interface operation for bringing detailed experimental data
into the software system.

[0097] An "Index Web page" is a Web page that
consists of links to other Web pages, e.g. the index of all
Collections in the current model.
[0098] The term "inhibit" refers to a decrease in the
effects of a biological agent or a biological process.
[0099] The term "interaction" refers to a process or
relationship involving two or more entities, e.g., biological
entities such as a protein/protein binding interaction or
a protein/gene inhibitory interaction.
[0100] The term "Issue Based Information System"
refers to a class of computer software systems that
provide an explicit data representation, usually in
diagrammatic form, of the issues, positions, and arguments
generated during a group deliberation. An issue based
information system helps workgroups to document their
lines of reasoning in coming to decisions on complex
problems.

[0101] An "item" refers to a data structure that
represents a biological entity or other entity. An item is the
basic "atomic" unit of information in the software system.
[0102] The term "kinase" refers to an enzyme
involved in signal transduction, typically by transferring a
phosphate to another molecule.

[0103] The term "knowledge representation" refers to 
computational methods and data structures for encod- 
ing and storing real world knowledge, which may include 
a set of objects and the relationships between them, for 
example. Relationships are often defined by rules. 
[0104] A "memory indexing structure" is a theoretical
concept describing how the human brain may store
memories and arrange them in order to facilitate
subsequent retrieval.

[0105] The term "mass spectrometry" refers to a set
of techniques for measuring the mass and charge of
materials such as protein fragments, for example, such as
by gathering data on trajectories of the materials/
fragments through a measurement chamber. Mass
spectrometry is particularly useful for measuring the
composition (and/or relative abundance) of proteins and
peptides in a sample.

[0106] A "microarray" or "DNA microarray" is a
high-throughput hybridization technology that allows
biologists to probe the activities of thousands of genes under
diverse experimental conditions. Microarrays function
by selective binding (hybridization) of probe DNA
sequences on a microarray chip to fluorescently-tagged
messenger RNA fragments from a biological sample.
The amount of fluorescence detected at a probe position
can be an indicator of the relative expression of the gene
bound by that probe.

[0107] A "model", as used herein, refers to a data 
structure that contains all items, collections, and textual 
and graphical elements in a biological story; the compu- 
ter representation of all data in Results Managers, Col- 
lection Manager, Story Editor, and Diagram Editor. 
[0108] A "mouse sprite" refers to a displayed pointer 
on a computer screen, which corresponds to the move- 
ment of a mouse input to a graphical user interface. 
[0109] A "naming convention" is a mutually agreed
upon set of rules for naming of fields in experimental
data sets.

[0110] A "narrative structure" refers to the underlying 
structure of a biological story, i.e. its partitioning of infor- 
mation into Theme, Player, and Explanation compo- 
nents; also the way in which many cognitive psycholo- 
gists believe the human brain represents stories. 
[0111] The term "Object Editor" refers to a software 
component for presenting, manipulating, and annotat- 
ing the properties of items, collections, story nodes, and 
diagram nodes. 

[0112] An "oncogene" refers to an altered gene that
can lead to cancer.

[0113] An "oppose node" refers to an element in the
Story Editor that can be used to document information
and/or citations that dispute a claim made in a particular
story node.

[0114] An "outline processor" is a software tool for
textually building up an outline of a document, for example,
the Outline View in Microsoft Word.
[0115] A "pathway" refers to a sequence of processes
or mechanisms, such as biological processes or
mechanisms that relay information between and within cells
and/or produce biological products via biochemical
reactions.

[0116] The term "Pathway Diagram" refers to a dia- 
grammatic representation of a pathway, e.g., a biologi- 
cal pathway. 

[01 17] The term "pept'a! e bond" refers to a polar cov- 
alent chemical bond joining two amino acids. Peptide 
bonds form the protein backbone. 
[0118] "Persistent storage" refers to a computer medium
for storing and retrieving data. Persistent storage
typically can be facilitated in a file or database.
[0119] A "Player" refers to an entity that plays an active
role in a story; in the biological realm, a player is a
biological entity that plays an active role in a biological
story, e.g. a gene or protein that participates in a signal
transduction pathway.

[0120] A "polymer" is a large molecule formed by linking
together of smaller similar sub-units or "mers".
[0121] A "probe" (in a DNA microarray) refers to a
DNA sequence that selectively binds (hybridizes) to particular
DNA sequences in a biological sample, thus providing
a measure of the relative expression level of a
gene sequence of interest.

[0122] The term "promote" refers to an increase of the
effects of a biological agent or a biological process. 
[0123] A "protein" is a large polymer having one or
more sequences of amino acid subunits joined by pep- 
tide bonds. 

[0124] The term "protein abundance" refers to a 
measure of the amount of protein in a sample; often 
done as a relative abundance measure vs. a reference
sample. 

[0125] "Protein/DNA interaction" refers to a biological
process wherein a protein regulates the expression of
a gene, commonly by binding to promoter or inhibitor
regions.

[0126] "Protein/Protein interaction" refers to a biological
process whereby two or more proteins bind together
and form complexes.

[0127] "Publish to Web" refers to a system facility for
generating an interlinked set of HTML pages, where
each item, each collection, and each element of a collection
has its own Web page. This facility is useful for
sharing a model with colleagues who are not using the
present software system, since only a Web browser is
required for viewing and navigating the information published.

[0128] A "Results Manager" refers to a software com- 
ponent and user interface for viewing and manipulating 
items.

[0129] A "sequence" refers to an ordered set of amino
acids forming the backbone of a protein or of the nucleic
acids forming the backbone of a gene. 
[0130] The term "semantic overlay" or "data overlay"
refers to a user interface technique for superimposing
data from one view upon data in a different view; for example,
overlaying gene expression ratios on top of diagram
nodes in the Diagram Editor. This technique is
useful for informally validating high-level explanations 
and hypotheses against detailed experimental data. 

[0131] The term "signal transduction" refers to the relay
of information from receptors in the cell membrane
to the cell's response mechanism; the process by which 
stimulus energy is transformed into a response. 
[0132] A "spreadsheet" is an outsize ledger sheet
simulated electronically by a computer software application;
used frequently to represent tabular data structures.

[0133] The term "Story Editor" refers to a software
component for presenting and manipulating elements of
a biological story, such as Players, Alternatives, and Explanations.
The Story Editor can be thought of as a textual
mechanism for putting together a biological story.
[0134] A "story grammar" refers to a set of formal rules
for organizing and interrelating the elements of a biological
story; derived from research in cognitive psychology
into story grammars as a way of structuring information
in stories; related to forming memory indexing structures.
[0135] A "story node" refers to an element in the Story
Editor, e.g. a Theme, Player, Interaction, Alternative, 
etc. 

[0136] The term "story structure" refers to the manner 
in which elements of a biological story are organized and
interrelated.

[0137] A "support node" when used in the context of 
the Story Editor is an element that can be used to document
information and/or citations that support/reinforce
a claim made in a particular story node.
[0138] The term "synthesis" refers to the combining
of elements into a single or unified entity.
[0139] The term "syntax-directed editor" refers to a 
software tool for editing a document wherein the information
added is constrained by grammatical rules. A
syntax-directed editor is useful in helping a user struc- 
ture a document for subsequent ease of reuse of the 
information. An example of a syntax-directed editor is 
the Story Editor. 

[0140] The term "text mining" refers to a computational
process of extracting higher-level knowledge from
patterns of text in a document.
[0141] A "Theme" refers to a brief description of the 
overall gist of a biological story, such as might appear 
in the abstract of a journal article.
[0142] A "time course" refers to a series of measurements
of a biological phenomenon taken over defined
intervals of time, e.g. measurements of gene expression
levels over 1, 3, 24, 48 hours in response to a treatment
of a cell sample, such as exposure to ultraviolet light.
[0143] The term "time stamp" refers to a data field that
represents the date and time that an annotation was 
made or a citation added to the system. A time stamp is 
stored by the system whenever an annotation is made 
or a citation is added, and is useful in tracking changes
made by members of a work group. 
[0144] The term "top-down hypothesis formulation" 
refers to the deductive process of deriving a high-level 
explanation or hypothesis, beginning with a mental 
model of a process and utilizing concepts and patterns
inferred by "bottom up" data analysis.
[0145] The term "tools menu" refers to a user-inter- 
face mechanism for choosing one of a number of pos- 
sible auxiliary operations, e.g. Publish to Web. 
[0146] A "tree" is a hierarchical data structure and visualization
in which nested levels of information are represented
as branches and leaves of a tree. The Collection
Manager and Story Editor both represent their data
as trees. 

[0147] The term "up-regulation", when used to describe
gene expression, refers to an increase in the
amount of messenger RNA (mRNA) formed by expres- 
sion of a gene, with respect to a control. 
[0148] The term "UniGene" refers to an experimental
database system which automatically partitions DNA
sequences into a non-redundant set of gene-oriented
clusters. Each UniGene cluster contains sequences that
represent a unique gene, as well as related information
such as the tissue types in which the gene has been
expressed and chromosome location.
[0149] The term "URL" or "Uniform Resource Locator"
refers to a protocol for specifying addresses on the Internet
used for locating resources such as Web pages.
[0150] A "Web page" refers to a single hypertext doc- 
ument, typically resident on the World Wide Web, that 
can incorporate text, graphics, sound, etc. 
[0151] The "World Wide Web" is a system of extensively
interlinked hypertext documents; a branch of the
Internet. 

[0152] The term "view" refers to a graphical presentation
of a single visual perspective on a data set, for
example a spreadsheet or tree diagram. 
[0153] The term "visualization" or "information visualization"
refers to an approach to exploratory data analysis
that employs a variety of techniques which utilize
human perception; techniques include graphical presentation
of large amounts of data and facilities for interactively
manipulating and exploring the data.

The term "XML" or "Extensible Markup Language" refers to a
World Wide Web standard, derived from HTML, for representing
structured information in hypertext documents.
XML extends HTML in that documents are represented
as rich tree structures; typically used for storing
and transmitting data, rather than textual documents,
between computer systems.
[0154] Biomedical researchers are inundated by data 
which exists in a myriad of forms and from a myriad of 
sources. From this vast amount of data, the researchers 
are faced with an unenviable task of culling meaningful 
data from a vast amount of "noise" or data which is not 
pertinent to the task at hand. Put another way, researchers
seek to find needles of causality in haystacks of correlation.

[0155] To make the data meaningful and useful, researchers
endeavor to construct a working explanation
or "story" of what a gene or protein or other entity does, 
and how it interacts in pathways with other genes or pro- 
teins and their products, or in other chemical reactions 
or dynamic processes. For example, a story might por- 
tray a cascading set of proposed causal relationships 
between gene expression states. A specific example of 
this is a "biological story" built up by a team of biomedical
researchers studying the influence of an oncogene
(cancer-related gene) on a rare form of cancer [cite
Khan et al, PNAS]. The researchers have run a number
of experiments, using DNA microarrays to probe the influence
of the PAX3-FKHR oncogene on thousands of
genes under diverse experimental conditions. They
have identified a number of affected genes, such as Myogenin
and MyoD, which in turn may be playing influential
roles in the cancer process. The researchers believe
a cascade of activation events, initiated by the
PAX3-FKHR oncogene, results in a pediatric muscle
cancer, known as alveolar rhabdomyosarcoma
(ARMS). The experimental data indicates that
PAX3-FKHR directly induces (activates) the genes Myogenin
and MyoD, and, through the actions of these two
genes, induces the gene My14, a gene that is known to
be associated with muscle cell growth and differentiation.
The perturbation on the effects of My14 results in
a failure of the muscle cells to differentiate and end the
cell cycle. The failure of the muscle cells to exit the cell
cycle results in cells proliferating in an uncontrolled
manner (i.e. cancer).

[0156] The present invention provides tools and
methods for constructing a story through iterative and 
interactive processes which may include any combina- 
tion or all of the following: gathering information; organ- 
izing information into concepts and categories; formu- 
lating and documenting tentative explanations and hy- 
potheses; documenting explanations and hypotheses 
via textual notes and graphical sketches; sharing expla- 
nations and hypotheses with colleagues; and incorpo- 
rating verification and feedback from colleagues into the 
story. 

[0157] To support these processes, the system ac- 
cording to the present invention provides a coordinated 
set of interactive information organization and synthesis 
tools, built upon a simple conceptual model using a free- 
form database and a narrative structure, incorporating 
and building items, collections, and biological stories. 
[0158] Narrative structure is used based on findings 
in cognitive psychology and knowledge representation 
literature that people use story structure as a way of or- 
ganizing and remembering information and that story 
creation is a fundamental process for constructing memory
indexing structures; see, for example, Thorndyke, P.
W., "Cognitive Structures in Comprehension and Memory
of Narrative Discourses", Cognitive Psychology, 9,
1977, pp. 77-110; and Schank, R., "Tell Me a Story: Narrative
and Intelligence", Northwestern University Press,
1990; both of which are incorporated herein in their en- 
tireties, by reference thereto. The present invention ap- 
plies a story grammar as a framework for organizing and 
indexing biological stories. 

[0159] The free-form database model enables the user
to more easily build up and evolve the information
structure that supports a biological story. The strength 
of a free-form database model is that the entry of data 
can precede the creation of database tables; the user 
can "add data now and categorize later". The free-form 
model is the central data structure of the software sys- 
tem; it encompasses all the information including exper- 
imental data, annotations, categorization, and textual 
and graphical explanations of biological processes. 
Models can be saved and restored and a group of users 
can work with multiple models.
[0160] Fig. 1 shows examples of main windows of a
system according to the present invention. The system 
may be built as a Java program to obtain portability 
across operating systems. Web and XML technology 
are used to represent and store information in a flexible 
fashion. While the implementation shown herein focuses
on genes and gene expression, the techniques disclosed
are equally useful for other biological data and
problem areas, such as protein abundance, cell localization,
protein/protein interactions, and protein/DNA interactions.
Likewise, the techniques could be applied to
other domains with problems concerning large numbers
of interacting elements, e.g. the management of com- 
plex telecommunications networks. 
[0161] The main windows shown include: a Results
Manager 20 for viewing detailed experimental results; a
Collection Manager 30 for organizing experimental results
and other information into groups and categories;
a Story Editor 40, which provides a narrative structure
for textually organizing information about the interrelationships
and interactions amongst items and collections
in biological processes; and a Diagram Editor 50,
for graphically organizing information about the interrelationships
and interactions amongst items and collections
in biological processes. The Diagram Editor 50 also
allows the construction of semantic overlays for validating
high-level explanations against experimental results.
An Object Editor 60 (Fig. 2) is provided for editing
and annotating the properties and contents of items and
collections.

[0162] Each window in Fig. 1 represents a different
view into the overall model. These views and their associated
data structures are closely and consistently
coupled. An interactive change to an entity in any one
view is reflected in all other views via a graphical user
interface technique known as the Model/View/Controller
paradigm, which is a specific type of event-driven programming
which may be carried out using the JAVA programming
language, for example.
[0163] Model/View/Controller is a fundamental object-oriented
programming paradigm which separates
the actual data (represented by the model)
from the view of the data. The object (data structure)
that represents the data has procedures that signal an
event whenever the data is changed in any way, such
as by deletion of data, addition of data, or modification
of existing data, for example. By signaling an event, a
message is sent indicating that the data has been
changed.

[0164] The "Controller" aspect of the programming is
implemented as a JAVA execution environment. A "listener"
(a "listener" is a readily available JAVA construct)
is defined and implemented by each view (e.g., results
manager, collection manager, story editor, diagram editor,
etc.) which registers with the controller to indicate
that the viewer that is associated with each respective
listener is interested in hearing about, or being notified
when, an event is signaled to indicate that data has been
changed. The role of the controller is to coordinate the
flow of events to listeners. When a listener receives a
message (i.e., event) issued with regard to a change in
data, it initiates procedures, which are specifically defined
with respect to each viewer as to what action to
take when that particular message has been received.
Thus, code that is specific to each viewer is executed
substantially simultaneously to make changes to each
view that represent the same change that was made to
the data.

[0165] For example, a user may change the name of 
a collection in a collection manager. Assuming that this 
collection has already been added as a Player in the 
Story Editor prior to the user's change in the collection 
name, then a listener for the story editor receives the 
event that is generated when the collection name is 
changed. That listener then initiates execution of the
procedures associated with the story editor which im- 
mediately make the collection name change in the story 
editor view. To the user, it appears that the collection 
name changes immediately, simultaneously with the 
change in the collection manager as the user manually 
makes the change in the collection manager. 
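
The event flow in the collection-rename example can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not the code of the described system: the names ModelListener, Controller, CollectionModel, and StoryEditorView are hypothetical, and the view is reduced to an object that holds the label it would display.

```java
import java.util.ArrayList;
import java.util.List;

public class MvcSketch {
    /** Hypothetical listener interface implemented by each view. */
    interface ModelListener {
        void collectionRenamed(String oldName, String newName);
    }

    /** Hypothetical controller: coordinates the flow of events to listeners. */
    static class Controller {
        private final List<ModelListener> listeners = new ArrayList<>();

        void register(ModelListener listener) {
            listeners.add(listener);
        }

        void fireRenamed(String oldName, String newName) {
            for (ModelListener listener : listeners) {
                listener.collectionRenamed(oldName, newName);
            }
        }
    }

    /** Data object that signals an event whenever its name is changed. */
    static class CollectionModel {
        private String name;
        private final Controller controller;

        CollectionModel(String name, Controller controller) {
            this.name = name;
            this.controller = controller;
        }

        String getName() {
            return name;
        }

        void setName(String newName) {
            String oldName = name;
            name = newName;
            controller.fireRenamed(oldName, newName);
        }
    }

    /** Stand-in for the story editor view: shows the collection as a Player label. */
    static class StoryEditorView implements ModelListener {
        String playerLabel;

        StoryEditorView(String initialLabel) {
            playerLabel = initialLabel;
        }

        @Override
        public void collectionRenamed(String oldName, String newName) {
            // View-specific reaction: update the label only if it shows this collection.
            if (playerLabel.equals(oldName)) {
                playerLabel = newName;
            }
        }
    }

    public static void main(String[] args) {
        Controller controller = new Controller();
        StoryEditorView storyEditor = new StoryEditorView("Kinases");
        controller.register(storyEditor);

        CollectionModel collection = new CollectionModel("Kinases", controller);
        collection.setName("Protein kinases"); // change made in the collection manager
        System.out.println(storyEditor.playerLabel);
    }
}
```

In this sketch every registered view would receive the same event, so adding a second view (for example, a diagram editor stand-in) requires only another register call.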
[0166] Consistency and close coupling of multiple 
views enables the user to simultaneously view information
from a variety of perspectives and across different
levels of abstraction. This facilitates the discovery of unforeseen
interrelationships, thus aiding the process of
piecing together explanations and hypotheses. 

ITEMS 

[0167] Items are the basic "atomic" unit of information.
They represent biological entities such as genes, proteins,
sequences, and other gene products, or other entities
in the case of a non-biological application of the
system, such as network nodes or probes, for example.
Items may contain detailed information about a biological
entity, such as the quantitative results from an experimental
assay. The user can create items by importing
an experimental data set into the system. The user
can import an experimental data set into a Results Manager
20 via the Import Bio Data item 12 on the File Menu
10 (see Fig. 3). Selecting the Import Bio Data menu item
12 results in a prompt for a file to import, via a "file chooser"
dialog, which is similar in operation to the file chooser
dialog in Microsoft Windows Explorer. The Import Bio 
Data operation imports a set of experimental data, such 
as gene expression data. Data is imported in the form 
of a spreadsheet with tab-separated columns. Each row 
of the spreadsheet data is read and used to create a 
new item that is added to the Results Manager 20. Prop- 
erties and values are assigned to each item based upon 
the information imported from the appropriate columns.
[0168] In order to correctly make assignments to 
items and their data values, the program relies upon 
auxiliary file header information and conventions on how
columns are named. The naming conventions in the cur- 
rent invention are specified in succeeding paragraphs. 
While the current invention supports naming conven- 
tions for gene expression data from microarray experi- 
ments, the import mechanism is generalized in principle 
and naming conventions can be defined to support im- 
port from other data sources, such as mass spectrom- 
etry data, or telecommunications data, for example. 



[0169] The imported data files must have two additional
"header" lines pre-pended to the actual data:
[0170] # gene data version 1.1
[0171] # unigene-id<tab>gene-name<tab><format>-<col>-<name><tab> . . .

[0172] Where <format> is one of:

• double - specifies that this column represents a
Double value. This value will not be considered an
experimental result (will not show up as a colored
cell in the Results Manager 20 that encodes an experimental
result, nor will it be used in any semantic
overlays).

• int - specifies that this column represents an Integer
value. This value will not be considered an experimental
result (will not show up as a colored cell in
the Results Manager 20 that encodes an experimental
result, nor will it be used in any semantic
overlays).

• text - specifies that this column represents a text
value. All text up to the next \t (tab) or end of line is
read and considered part of the text value. This value
will not be considered an experimental result (will
not show up as a colored cell in the Results Manager
20 that encodes an experimental result, nor
will it be used in any semantic overlays).

• data - specifies that this column represents a Double
value. This value will be considered an experimental
result and will be shown as a colored cell in
Results Manager 20 and also used for color encodings
in overlays.

[0173] <col> specifies the column where this data
should be initially presented in the Results Manager 20.
<name> specifies the actual name of the column.

[0174] 'unigene-id' is the header for the field that
specifies the identifier in the Unigene database for the
item, and 'gene-name' is the header for the field that specifies
the name of the item. For example,

[0175] # unigene-id gene-name data-1-UACC75 data-2-UACCS9

[0176] Mismatched double quotes, single quotes, and 
extra ending white space are removed from names. 
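
The column naming convention above can be illustrated with a short Java sketch. It is offered under assumptions rather than as the system's actual importer: the class and method names (ImportHeaderSketch, ColumnSpec, parseColumn, parseHeaderLine, cleanName) are hypothetical, and the name clean-up simply drops every single or double quote character and trailing white space, which is one possible reading of the rule stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImportHeaderSketch {
    /** One parsed "<format>-<col>-<name>" column header. */
    static class ColumnSpec {
        final String format; // double, int, text, or data
        final int col;       // initial column position in the Results Manager
        final String name;   // cleaned column name

        ColumnSpec(String format, int col, String name) {
            this.format = format;
            this.col = col;
            this.name = name;
        }

        /** Only "data" columns are treated as experimental results. */
        boolean isExperimentalResult() {
            return format.equals("data");
        }
    }

    static final List<String> FORMATS = Arrays.asList("double", "int", "text", "data");

    /** Assumed clean-up: remove quote characters and trailing white space. */
    static String cleanName(String raw) {
        return raw.replace("\"", "").replace("'", "").replaceAll("\\s+$", "");
    }

    static ColumnSpec parseColumn(String header) {
        // Limit of 3 keeps any further hyphens inside the name part.
        String[] parts = header.split("-", 3);
        if (parts.length != 3 || !FORMATS.contains(parts[0])) {
            throw new IllegalArgumentException("Expected <format>-<col>-<name>: " + header);
        }
        return new ColumnSpec(parts[0], Integer.parseInt(parts[1]), cleanName(parts[2]));
    }

    /** Parses the second header line: "# unigene-id<tab>gene-name<tab>...". */
    static List<ColumnSpec> parseHeaderLine(String line) {
        if (!line.startsWith("#")) {
            throw new IllegalArgumentException("Header line must start with '#'");
        }
        String[] fields = line.substring(1).trim().split("\t");
        if (fields.length < 2 || !fields[0].equals("unigene-id") || !fields[1].equals("gene-name")) {
            throw new IllegalArgumentException("Header must begin with unigene-id and gene-name");
        }
        List<ColumnSpec> specs = new ArrayList<>();
        for (int i = 2; i < fields.length; i++) {
            specs.add(parseColumn(fields[i]));
        }
        return specs;
    }

    public static void main(String[] args) {
        List<ColumnSpec> specs = parseHeaderLine("# unigene-id\tgene-name\tdata-1-UACC75");
        ColumnSpec first = specs.get(0);
        System.out.println(first.format + " / " + first.col + " / " + first.name);
    }
}
```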

[0177] In the present invention, the software fills in,
for each imported item with a Unigene-id field, a URL 
for the Unigene entry for that item, which can be tra- 
versed from within the Object Editor 60 for that item. 
[0178] When a new data set is imported, the default
operation is to add the new data to any existing data, so
this may result in a duplication of items. The existing 
data set may be cleared by selecting the File => Clear 
out BioGrapher menu item 14. 
[0179] The upper-right pane in Fig. 1 contains a Results
Manager 20 having a viewer (Results:Genes) for
a data set of items. The Results Manager 20 is also
shown in Fig. 4. In the example in Fig. 1, the data is
drawn from several DNA microarray experiments. However,
the data can be imported from a variety of experimental
sources, for example relative protein abundance
measures derived from mass spectrometry. Also,
there can be multiple Results Manager 20 panes resident
in the system at any time.
[0180] In the Results Manager 20 in Fig. 4, each row
represents an individual item, such as a gene or protein.
Each column represents an attribute of the item. An attribute
of an item can be a property, such as its name,
or an experimental condition, e.g. a therapeutic treatment
or a tissue sample. Each cell in the Results Manager
20 (i.e. each row/column intersection) represents
a value for that attribute of the item. In the leftmost columns
in the Results Manager 20 of Figure 4, that value
is a gene expression ratio. This ratio is a measure of the
degree to which a gene is differentially expressed (or
"turned on") in an experimental sample (versus a reference
sample). For example, one might use DNA microarrays
to measure expression levels of many thousands
of genes across a set of different tumor tissues, contrasting
each with gene expression levels for normal tissue.
Many bioinformatics tools and databases store
gene expression data in this form, so it is relatively
straightforward to import gene expression data into the
software. In this example, expression ratios 22 are represented
by a color encoding which runs from green 22g
(highly down-regulated) to red 22r (highly up-regulated). 
The Results Manager 20 may be sorted, using the val- 
ues of any column as the sort key (not shown), by click- 
ing on the column heading. The sort key is an internal 
construct used by the software, rather than an entity dis- 
played in the user interface. 
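
One way such a green-to-red encoding could be computed is sketched below in Java. The text does not specify the mapping, so everything here is an assumption made for illustration: the ratio is placed on a log2 scale, a ratio of 1 maps to black, and the color saturates at an 8-fold change in either direction (the hypothetical constant MAX_LOG2).

```java
public class ExpressionColorSketch {
    // Assumed saturation point: |log2(ratio)| of 3, i.e. an 8-fold change.
    static final double MAX_LOG2 = 3.0;

    /**
     * Maps an expression ratio (sample versus reference) to an RGB triple.
     * Ratios below 1 shade toward green, ratios above 1 toward red.
     */
    static int[] ratioToRgb(double ratio) {
        if (!(ratio > 0)) {
            throw new IllegalArgumentException("ratio must be positive");
        }
        double log2 = Math.log(ratio) / Math.log(2.0);
        // Clamp to [-1, 1] so extreme ratios share the fully saturated color.
        double t = Math.max(-1.0, Math.min(1.0, log2 / MAX_LOG2));
        int intensity = (int) Math.round(Math.abs(t) * 255);
        return t >= 0 ? new int[] {intensity, 0, 0} : new int[] {0, intensity, 0};
    }

    public static void main(String[] args) {
        int[] rgb = ratioToRgb(16.0); // strongly up-regulated
        System.out.println(rgb[0] + "," + rgb[1] + "," + rgb[2]);
    }
}
```

A logarithmic scale is used in this sketch so that a two-fold increase and a two-fold decrease receive equally strong colors.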

[0181] Items also serve as repositories for links to 
public data, such as literature citations. The user can 
move Web-based information for a gene into the item
representing that gene by dragging and dropping (or
copying and pasting) text and URLs from a Web page
(e.g., an NCBI Genbank entry for a gene) onto the appropriate
item. In addition to providing ways for the user
to manually enter links to items, the system can also
semi-automatically populate items with links to detailed
data. For example, knowledge discovery and data mining
tools can be utilized to retrieve pertinent literature
references and database entries for an item. Further examples
of knowledge discovery and data mining tools
can be found in commonly owned, co-pending Application
(Application Serial No., not yet assigned; Attorney's
Docket No. 10020142-1) filed concurrently herewith and
titled "Biotechnology Information Naming System", and
in commonly owned, co-pending Application No.
10/033,823, filed December 19, 2001 and titled "Domain
Specific Knowledge-Based Metasearch System and
Methods of Using", both of which are incorporated herein,
in their entireties, by reference thereto.


COLLECTIONS 

[0182] In order to build new abstractions, it is often
useful for the user to group together chunks of related
information. For example, a set of genes known to influence
muscle cell differentiation may be thought of, manipulated,
and annotated together as a single group or
"concept". For example, proteins which all belong to the
same family, e.g. growth factors, might for purposes of 
efficiency or convenience be thought of, manipulated, 
and annotated as a single group, rather than as individ- 
ual proteins. The system supports these groupings 
through constructs known as collections. Collections 
are free-form sets of items. Collections are typically user-created,
but can also be programmatically created,
e.g. from the results of text mining. 
[0183] The user can group items into collections by 
dragging and dropping items from the Results Manager 
20 onto the desired collection in the Collection Manager 
60. Fig. 5 shows a Collection Manager window 62, which 
displays a tree view of collections; and functions in a 
way that is analogous to the tree view of folders in Windows
Explorer. The user can create a new collection by
pressing the right mouse button in the Collection Man- 
ager, then selecting the "New" item on the Collection 
Manager menu 64 shown in Fig. 6. 
[0184] The Collection Manager 60 can also populate 
collections semi-automatically. One mechanism is by 
searching experimental data in the Results Manager 20 
on a specified term or phrase. Using a dialogue box, the 
user enters a biological term of interest, for example, 
"kinase," and a collection will be built consisting of items 
in the Results Manager 20 whose names have a match 
for that term. Likewise, new collections can be formed 
by text mining of scientific literature, for example by looking
for biological entities whose names co-occur frequently
in journal articles. Commonly owned, co-pending
Application (Application No. not yet assigned; Attorney's
Docket No. 10020151-1) filed concurrently herewith
and titled "System, Tools and Methods to Facilitate
Identification and Organization of New Information
Based on Context of User's Existing Information" provides
tools for relevance ranking and filtering text that
may be useful with the present invention, and is hereby 
incorporated, in its entirety, by reference thereto. 
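
The term-based population of a collection described above can be sketched in a few lines of Java. This is an illustration only: the method name collectByTerm is hypothetical, items are reduced to their names, and "have a match for that term" is assumed here to mean a case-insensitive substring match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollectionSearchSketch {
    /**
     * Builds a new collection (here, a list of item names) from the items
     * whose names contain the search term, ignoring case.
     */
    static List<String> collectByTerm(List<String> itemNames, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String name : itemNames) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Item names below are made up for the example.
        List<String> resultsManagerItems = Arrays.asList(
                "protein kinase C", "myogenin", "Tyrosine Kinase receptor");
        System.out.println(collectByTerm(resultsManagerItems, "kinase"));
    }
}
```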
[0185] Collections are very malleable. Collections
may be split or merged; items or groups of items may
be added, deleted, or moved from one collection to another.
Collections may be nested, i.e., a collection can
can be overlaid with detailed experimental data, for ex- 
ample by overlaying a set of expression levels on a col- 
lection of genes and highlighting the names of those 
genes whose expression levels exceed a certain threshold.
Commonly owned, co-pending Application (Application
No. not yet assigned; Attorney's Docket No.
10020167-1) filed concurrently herewith and titled "System
and Methods for Extracting Pre-Existing Data From
Multiple Formats and Representing Data in a Common
Format for Making Overlays" provides tools and methods
for performing overlays which may be useful with
the present invention, and is hereby incorporated, in its
entirety, by reference thereto.
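
A nested collection and a threshold overlay of the kind described in this paragraph might be modeled as in the following Java sketch. The class ItemCollection and the method highlight are hypothetical names, items are represented by their names only, and the expression levels are passed in as a plain map; none of this is taken from the referenced application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionOverlaySketch {
    /** A free-form set of items that may also contain other collections. */
    static class ItemCollection {
        final String name;
        final List<String> items = new ArrayList<>();
        final List<ItemCollection> children = new ArrayList<>();

        ItemCollection(String name) {
            this.name = name;
        }

        /** Item names of this collection and all nested collections, depth-first. */
        List<String> allItems() {
            List<String> all = new ArrayList<>(items);
            for (ItemCollection child : children) {
                all.addAll(child.allItems());
            }
            return all;
        }
    }

    /** Returns the item names whose overlaid level exceeds the threshold. */
    static Set<String> highlight(ItemCollection c, Map<String, Double> levels, double threshold) {
        Set<String> highlighted = new LinkedHashSet<>();
        for (String item : c.allItems()) {
            Double level = levels.get(item); // items without data are not highlighted
            if (level != null && level > threshold) {
                highlighted.add(item);
            }
        }
        return highlighted;
    }

    public static void main(String[] args) {
        ItemCollection muscle = new ItemCollection("Muscle differentiation");
        muscle.items.add("MyoD");
        ItemCollection nested = new ItemCollection("Nested group");
        nested.items.add("Myogenin");
        muscle.children.add(nested);

        Map<String, Double> levels = new HashMap<>(); // made-up expression ratios
        levels.put("MyoD", 4.2);
        levels.put("Myogenin", 1.1);
        System.out.println(highlight(muscle, levels, 2.0));
    }
}
```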

[0186] As with items, collections can serve as repositories
for links to detailed experimental data and public
data, such as literature references. The advantage here
over simply adding all the links to each of the members
of the collection is that the link or annotation may be
more relevant to the "concept" embodied by the collection,
for example a link to information about the kinase
family of proteins. The user moves Web-based information
about a collection by dragging and dropping (or cutting
and pasting) text and URLs from a Web page (e.g.
an NCBI Genbank entry) onto the appropriate collection 
in the Collection Manager 60. 


BIOLOGICAL STORIES 

[0187] Concurrently or consecutively with data import
and annotation, the user can begin, with colleagues, to
piece together higher-level explanations of biological
processes by constructing biological stories, utilizing
narrative structure to represent the state of the user's
hypotheses and understandings. Narrative structure
provides a framework for organizing information about
the interrelationships and biological interactions
amongst items and collections in biological processes.
Biological stories can be used, for example, as templates
for organizing and describing what is going on in
the cell. A biological story can also be thought of as the
representation of a hypothesis and the train of thought
that produced that hypothesis.
[0188] The user can piece together knowledge about
a biological phenomenon and compose a biological story
by using the Story Editor 40 component shown in
Figs. 1 and 8. The Story Editor 40 is a syntax-directed
tree editor, the syntax utilizing a story grammar, derived
from cognitive psychology research and literary theory.
The current invention provides a default story grammar;
however, the grammar is user-configurable and the user(s)
can substitute terms that are more intuitive or meaningful
to them than those in the default story grammar.
The default story grammar in the current invention is
shown in Fig. 9.

[0189] A biological story includes three main sections:
a Theme 42, a list of one or more Players 44, and a set
of Explanations 46. The Theme 42 is a brief description
of the overall gist of a biological story, such as might
appear in the abstract of a journal article. The Players
44 comprise biological entities that play a role in the biological
process being described in the story, for example
genes and proteins, or collections of genes and/or
proteins. Explanations 46 describe the "plot" of the story;
they are essentially a set of evolving hypotheses
about what processes may be occurring in a living cell,
which are implied by the experimental data associated
with the story.

[0190] An Explanation 46 can include one or more Interactions
48, basically steps in the process that is being
described; for example, "PAX3-FKHR induces MY14".
Different hypotheses can be represented by Alterna- 
tives 49, which specify different sets of possible Inter- 
actions 48. This is often useful in formative stages of an 
investigation, where there may be several plausible ex- 
planations for a particular biological phenomenon. 
[0191] The user can document the reasoning behind 
Theme 42, Explanation 46, Interaction 48, and/or Alternative
49 story "elements", also referred to in this document
as story "nodes", via Support and Oppose story
elements. For example, the biologist can use a Support 
node to provide a citation from the literature that pro- 
vides supportive evidence for the claims made in the Al- 
ternative 49. Likewise, the biologist can use an Oppose 
story node to provide a citation from the literature that 
provides evidence that disputes a claim. 
[0192] The Story Editor 40 is a syntax-directed editor 
in which a biological story is represented by a tree struc- 
ture. In this way, it is like an "outline processor". The tree 
appears on a canvas 41 on the right side of the Story 
Editor 40. Descriptions of biological phenomena are
added to this tree, with nodes that correspond to the
elements of narrative structure, i.e. Players 44, Explanations
46, etc. On the left side of the Story Editor is a set
of buttons 400, which are used for adding nodes to (or
deleting nodes from) the tree. Story nodes can be added
to and deleted from the tree and textual descriptions can
be added to story nodes in the tree. Textual descriptions
can be added to any node by either editing the node's
label in place or by invoking an Object Editor 60 interface,
described in detail in a later section. Each story
node represents an element of narrative structure: for
example, a Player 44, Explanation 46 or Interaction 48.
[0193] A story node can be added by pressing a but- 
ton in the Story Editor 40, for example pressing the Play- 
er button 404 to add a Player. For any story node in the 
story, there is a valid set of story nodes that can be nest- 
ed below it. For example, it is valid to add a Player 44 
to the Players node, but not to the Theme node. When
a story node is added, the buttons representing the valid 
story nodes that can be nested below it are enabled, 
whereas the non-valid story nodes are disabled (grayed 
out). 
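The nesting rule described above can be sketched as a lookup table from each node type to the node types allowed below it. This is only an illustrative sketch: the names `STORY_GRAMMAR`, `StoryNode`, and `enabled_buttons` are hypothetical, and the specific nesting rules are assumptions pieced together from the surrounding paragraphs rather than the default grammar of Fig. 9.

```python
# Hypothetical encoding of a story grammar: each node type maps to the
# node types that may be nested directly below it. These rules are
# assumptions for illustration, not the actual grammar shown in Fig. 9.
STORY_GRAMMAR = {
    "Story": {"Theme", "Players", "Explanations"},
    "Theme": {"Support", "Oppose"},
    "Players": {"Player"},
    "Player": {"Support", "Oppose"},
    "Explanations": {"Explanation", "Alternative"},
    "Explanation": {"Interaction", "Alternative", "Support", "Oppose"},
    "Interaction": {"Alternative", "Support", "Oppose"},
    "Alternative": {"Explanation", "Interaction", "Support", "Oppose"},
    "Support": set(),
    "Oppose": set(),
}


class StoryNode:
    """One node of the story tree (hypothetical class)."""

    def __init__(self, kind, label=""):
        if kind not in STORY_GRAMMAR:
            raise ValueError(f"unknown node type: {kind}")
        self.kind = kind
        self.label = label
        self.children = []

    def enabled_buttons(self):
        # Node types whose buttons would be enabled when this node is selected.
        return sorted(STORY_GRAMMAR[self.kind])

    def add(self, kind, label=""):
        # Reject node types the grammar does not allow below this node.
        if kind not in STORY_GRAMMAR[self.kind]:
            raise ValueError(f"{kind} cannot be nested below {self.kind}")
        child = StoryNode(kind, label)
        self.children.append(child)
        return child
```

Because the grammar is plain data, a user-configurable grammar would only need to swap the table, not the editor logic.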

[0194] The user typically starts building up a biological
story by specifying the Players 44 in the story. Alternatively,
an existing story may be imported into the present
system and displayed in the Story Editor 40. The Players
44 in a biological story can be either items or collections. 
Players 44 may be added to a story by dragging and 
dropping (or cutting/copying and pasting) them from the 
Results Manager 20 and/or the Collection Manager 30, 
for example, when a story is being built or modified. 
Players 44 can also be added by pressing the Player 
button 404 and then adding descriptive text to the added 
element, as described above. 

[0195] In its simplest form, the "plot" of a biological 
story represents a sequence or set of Explanations 46, 
which in turn contain a sequence or set of Interactions
48. The user creates Explanations 46 by selecting the
Explanation button 406 in the Story Editor 40, which
causes an Explanation node to be added to the biological
story. The user then enters a textual description of
the biological Explanation 46 by either editing the node's
label in place or by invoking an Object Editor 60 interface
that provides for detailed annotation of any node.
[0196] The user creates Interactions 48 by selecting
the Interaction button 408 in the Story Editor 40, which
causes an Interaction node to be added to the biological
story. The user then enters a textual description of the
biological Interaction 48 by either editing the node's
label in place or by invoking an Object Editor 60 interface
that provides for detailed annotation of any node.
[0197] In a situation where there may be more than 
one possible explanation for a sequence of events, al- 
ternative hypotheses for what is going on may be gen- 
erated and tracked. This is often the case in the early 
phases of investigation, where there often are several 
possible explanations for a phenomenon. The user can 
add and keep track of all of the alternative hypotheses, 
and evolve them as the understanding of events be- 
comes refined. To represent an alternative hypothesis, 
an Alternative node is added to the Explanations 46 of 
the biological story, or to a specific Explanation 46 or 
Interaction 48, by selecting the Alternative button 409. 
Then an alternative sequence of Explanations and/or In- 
teractions can be added to that Alternative. 
[0198] Since the user typically will have assumptions 
or evidence underlying different hypotheses, it is useful 
to keep track of these assumptions and evidence. The 
user can add a Support node to a Theme 42, Explanation
46, Player 44, Alternative 49, or Interaction 48 by
selecting the Support button 410, and inputting that
information under the appropriate node. Similarly,
information that contradicts a hypothesis may be tracked.
This is done by adding an Oppose node in the same 
manner as described above with regard to a Support 
node, except that the Oppose button 412 is selected to
accomplish this task. Textual information may be added 
to the Support and/or Oppose node by either editing the 
node's label in place or by invoking an Object Editor 60 
interface that provides for detailed annotation of any 
node. Database and literature citations may be added 
to the Support and/or Oppose nodes by dragging and
dropping a URL from a Web page onto a Support or Oppose
node, or onto the Object Editor 60 interface for that
node. 

PUTTING THE STORY TOGETHER GRAPHICALLY 

[0199] Using the Story Editor component 40, the user 
can build up a structured textual representation of a bi- 
ological story. However, many people think graphically 
about stories and often use sketches and diagrams to 
represent their thinking about an explanation they are 
piecing together. This invention provides a Diagram Editor
component 50, shown in Figs. 1 and 11, which may
be used to put together a biological story pictorially. An
analogy can be drawn here to Computer-Aided Circuit
Design (CAD) software, particularly to CAD schematic
capture tools, in that the biologist uses the Diagram Editor
50 to sketch out a representation of the "circuitry" of
a biological process, such as might be found in a signal
transduction pathway.

[0200] The Diagram Editor 50 is general and extensible
and can be used to represent a variety of biological
processes that can be expressed in diagrammatic form,
for example biochemical pathways and/or protein/protein
interaction maps. Likewise, the Diagram Editor 50
can be generalized to represent diagrams in other domains,
such as telecommunications network diagrams.

[0201] The Diagram Editor component 50 includes a
canvas 52 on the right and a set of buttons 54 on the
left for adding elements. In the Diagram Editor component
50, the user can put together diagrams representing
relationships between biological entities. These
biological entities can correspond to items in the Results
Manager 20, collections in the Collection Manager 30,
Players 44 in the Story Editor 40, or any arbitrary
information added to the Diagram Editor 50 by the user (or
added programmatically). These biological entities and
their relationships can be thought of as the "nouns" and
"verbs" of the biological story. In the present invention,
the "nouns" are represented by the biological entities
and the "verbs" are represented by the interactions between
them. In the Diagram Editor 50, the "nouns" are
implemented as Diagram Nodes 56 and the "verbs" are
implemented as Diagram Interactions 58.
[0202] The pictorial story can be built up by dragging
and dropping items, collections, and/or Players 44 onto
the Diagram Editor panel (canvas 52), or by adding an
arbitrary diagram node 56 (either manually via a
context-sensitive menu or programmatically via data/text
mining software). When dragging and dropping onto the
canvas, a graphical icon, representing the biological entity,
appears at the drop point. There is a set of predefined
"verbs" which are used to specify a relationship
between "nouns", for example Inhibits, Promotes, or
BindsTo. Commonly owned, co-pending Application
(Application No. not yet assigned; Attorney's Docket No.
10020150-1) filed concurrently herewith and titled "System
and Methods for Extracting Semantics from Images"
provides tools and methods for extracting semantics
from a static graphic image of a biological model and for
converting the static image to an editable biological
model which may be useful with the present invention,
and is hereby incorporated, in its entirety, by reference
thereto.

[0203] Two "nouns" are connected with a "verb" by selecting
the "verb" on the menu (e.g. by pressing a button
labeled Promotes 542), then drawing a line between the
two graphical icons representing the "nouns." Drawing
is accomplished by positioning the mouse sprite over
the first icon, pressing down on the mouse button, dragging
the mouse sprite over to the second icon, then releasing
the mouse button. A color-encoded arrow appears,
connecting the two graphic icons, for example a
red line represents the Promotes "verb." "Verbs" in the
Diagram Editor 50 are directional; that is, a red arrow
running from item A to item B indicates that "A Inhibits
B", but not the converse.

[0204] There is a duality between graphical and textual
storytelling. A textual story may be generated from
the contents of the Diagram Editor component 50. In an
analogous manner, diagram nodes 56 and diagram interactions
58 can be generated by parsing noun/verb
phrases in the text of the story.
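A minimal sketch of this duality follows, assuming each interaction is written as a simple "noun verb noun" phrase using the three predefined verbs named earlier. The names `DiagramInteraction`, `to_text`, and `from_text` are hypothetical, and the regular-expression parser stands in for whatever phrase parsing the actual system uses.

```python
import re
from typing import NamedTuple

# Verbs named in the description; a real system may define more.
VERBS = ("Inhibits", "Promotes", "BindsTo")


class DiagramInteraction(NamedTuple):
    """A directed "verb" from one "noun" to another (hypothetical type)."""
    source: str
    verb: str
    target: str


def to_text(interactions):
    # Diagram -> text: one "A Verb B" phrase per interaction.
    return [f"{i.source} {i.verb} {i.target}" for i in interactions]


_PHRASE = re.compile(r"^(\S+)\s+(" + "|".join(VERBS) + r")\s+(\S+)$")


def from_text(lines):
    # Text -> diagram: collect nodes in first-seen order plus interactions.
    nodes, interactions = [], []
    for line in lines:
        match = _PHRASE.match(line.strip())
        if match is None:
            continue  # not a recognizable noun/verb phrase; skip it
        source, verb, target = match.groups()
        for name in (source, target):
            if name not in nodes:
                nodes.append(name)
        interactions.append(DiagramInteraction(source, verb, target))
    return nodes, interactions
```

Because the tuple records source and target separately, "A Inhibits B" and "B Inhibits A" stay distinct, matching the directional "verbs" of the Diagram Editor.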

SEMANTIC OVERLAYS 

[0205] Often the user needs to do a "reality check" on 
a high-level story or explanation by comparing it with 
detailed experimental data. This is done to see if the 
experimental data is consistent with the claims made in 
the story. In other words, the "top-down" synthesis of the
textual and/or graphical stories needs to be reconciled
with the "bottom-up" exploration of the experimental data.
One way of reconciling the synthesis with the data is
to overlay items, collections, and biological stories with
detailed experimental data. For example, a set of expression
levels may be overlaid on the Players 44 in a biological
story and those genes whose expression levels
exceed a certain threshold can be highlighted. In this
way, the present invention provides a method for informally
testing the hypotheses represented in biological
stories. Such overlays are semantic, rather than literal,
in that the meanings of the data, rather than their visual
representations, are juxtaposed.
[0206] The present invention provides a method for
constructing semantic overlays in the Diagram Editor
component 50. If the items in the Results Manager 20
contain sets of quantitative values, for example expression
levels from microarray experiments, then the biologist
can "step through" each column of data and visualize
the data values, such as expression levels, color-coded
on top of the icons for those items in the Diagram
Editor 50. Such "simulations" can be useful, for example,
in inferring relationships between items, such as
causal relationships inferred by "stepping through" time
course data.
[0207] For example, in Fig. 1, many of the columns in
the Results Manager 20 represent values from thousands
of probes in DNA microarray experiments, where,
for example, test samples may be compared with reference
samples (e.g., diseased tissue versus "normal"
tissue) under various conditions. Cells (row/column
intersections) in the Results Manager 20 that are colored
reddish indicate an up-regulation of the gene, those that
are colored greenish indicate a down-regulation of the
gene, and a black color represents neutral, i.e., substantially
no up or down regulation. Various shades and intensities
of green and red result, which indicate the relative
degree of up or down regulation of any particular
probe. In the example, there were approximately 6000
rows in the matrix, although only a few have been shown
in Fig. 1 for reasons of simplicity. Each column represents
a different microarray experiment. This kind of
color-encoding of expression values is often referred to
as a "heat map".

[0208] In use, any column can be selected to overlay
the values of that column onto the diagram in the Diagram
Editor 50 and/or the Players 44 in the Story Editor
40. In the example shown in Fig. 1, when a column is
selected, any genes having values in that column are 
matched up with their representations in the Diagram 
Editor 50 and the Story Editor 40. A visual representa- 
tion of this overlay is displayed, wherein the overlaid da- 
ta shows up in its representative color on each of the 
nodes in the Diagram Editor 50 as well as in the Story 
Editor 40. This holds true for each node in the pathway 
diagram that references an item in the experimental da- 
ta, as well as each Player node in the Story Editor 40 
that references an item in the experimental data. 
[0209] A range of colors is mapped to a range of values
in the data. Items that have similar values will have
similar color schemes whereas items that are disparate
will have different color schemes. The user can repeat
this process, a column at a time from the values in the
Results Manager 20, thereby stepping through all of the
data resultant from the microarray experiments and analyzing
each column in the same manner to verify correlating
data and annotate discrepancies and outliers,
by visualizing the expression levels, color-coded on top
of the nodes for those items in the Diagram Editor 50
and/or Story Editor 40.
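The color mapping and column overlay described above could look roughly like the following sketch. It assumes signed expression values where positive means up-regulation (red), negative means down-regulation (green), and zero is neutral (black). The function names, the `max_abs` saturation parameter, and the dictionary layout are all hypothetical.

```python
def heat_color(value, max_abs=2.0):
    """Map a signed expression value to an (R, G, B) heat-map color.

    max_abs is an assumed saturation point: values at or beyond it
    in either direction get full-intensity red or green.
    """
    scale = max(-1.0, min(1.0, value / max_abs))  # clamp to [-1, 1]
    intensity = round(255 * abs(scale))
    if scale > 0:
        return (intensity, 0, 0)  # up-regulated: shades of red
    if scale < 0:
        return (0, intensity, 0)  # down-regulated: shades of green
    return (0, 0, 0)              # neutral: black


def overlay_column(column, node_items):
    """Color every node whose referenced item has a value in the column.

    column:     dict mapping item id -> value for the selected column
    node_items: dict mapping node name -> item id the node references
    """
    return {
        node: heat_color(column[item])
        for node, item in node_items.items()
        if item in column
    }
```

Stepping through the data then amounts to calling `overlay_column` once per selected column and redrawing the nodes with the returned colors; nodes that reference no item in the column are simply left uncolored.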

[0210] In addition to DNA microarray data, the present
invention is capable of performing overlays of data from
other diverse data sources, such as mass spectrometry
or gel electrophoresis data. Moreover, this functionality
can be generalized to other domains, for example in
overlaying measurement data from telecommunications
network probes onto network diagrams.

ANNOTATION AND CITATIONS 

[0211] To support users' keeping track of diverse pieces
of information and to support team communication
about the evolving information, this invention implements
a rich annotation and citation facility. Every item,
collection, story node, and diagram node or interaction
can have arbitrary textual notes attached to it.
[0212] The present invention provides an Object Editor
interface 60 for editing and annotating the properties
and contents of biological entities or other items and
collections. The Object Editor tool 60 is a form-based editor.
By typing into fields in these forms, the user can add
arbitrary annotations to the item or collection, as well as
add annotations for each link to detailed information. For
example, the user may want to add, as an annotation,
a note that summarizes his/her current understanding
of the function of a particular biological entity. The Object
Editor 60 can be invoked by double-clicking on any
biological object represented in the system. Fig. 2 shows
the Object Editor 60 for an item.
[0213] Any and every item, collection, story node, and
diagram node or interaction can have an arbitrary list of
citations attached to it. The user can add citations by
dragging/dropping URLs from a Web browser onto any
object in the system or into the Citations field 62 of the
Object Editor 60. Each citation can in turn have arbitrary
textual notes attached to it. The user can add a note
describing his or her reasoning or other context around
their using a particular citation.

SUPPORT FOR GROUP WORK 

[0214] While the invention will be useful for an individual
user in keeping track of information while building
up explanations and hypotheses, some of its real power
derives from the ability of the user to share biological
stories with colleagues and collaborators. This is a way
for the user to share the state of his/her thinking, receive
feedback from colleagues, incorporate that feedback into
the state of thinking, and, thus, refine the state of his/
her thinking.

[0215] The present invention includes a number of facilities
that support group work. Every annotation and
citation is tagged with the name of the user who enters
that annotation; it is also time-stamped. When the user
adds an annotation to a citation, the annotation communicates
to the group his or her reasoning behind using
that citation. As described earlier, the Support and Oppose
nodes in the Story Editor 40 enable users to record
their lines of argumentation as alternative hypotheses
are explored. It is very helpful to be able to articulate the
lines of thought, and evidence related to those lines of
thought, when working in groups.
[0216] The present invention further provides a repository
of generated Web pages, described below, to support
the sharing of biological stories and their supporting
information.

WEB REPOSITORY 

[0217] The present invention uses generated Web
pages to represent the detailed information contained
in its elements. The software generates an interlinked
set of HTML pages, where each item, each collection,
and each element of a story has its own Web page. A
Web page for an item is shown in Fig. 10. When new
information is associated with a data object, for example
by dragging and dropping (or copying and pasting) a
literature citation onto an item, that new information is
incorporated into the Web page for that item. The user
can navigate through this biological information space
by selecting and following the links on the Web pages
for items, collections, and stories. In addition to a specific
Web page for each data object, there are index Web
pages, one for the set of all items, one for the set of all
collections, and one for the set of all story elements. The
index page for the set of all story elements is shown in
Fig. 7. A Web repository for a model can be created by
selecting the "Publish To Web" menu item on the Tools
menu, shown in Fig. 12.
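As a rough illustration of such a repository, the sketch below writes one HTML page per item plus an index page linking to all of them. The function name `publish_items`, the file naming scheme, and the input layout (item name mapped to a list of citation URLs) are assumptions for illustration only; the page layout of the actual system is the one shown in the figures.

```python
import html
from pathlib import Path


def publish_items(items, out_dir):
    """Write one page per item and an index page; return the index path.

    items: dict mapping item name -> list of citation URLs.
    Assumes item names are safe to use inside file names.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index_links = []
    for name, citations in sorted(items.items()):
        page = out / f"item_{name}.html"
        cites = "".join(
            '<li><a href="{0}">{0}</a></li>'.format(html.escape(url, quote=True))
            for url in citations
        )
        page.write_text(
            "<html><body><h1>{}</h1><ul>{}</ul>"
            '<a href="index.html">All items</a></body></html>'.format(
                html.escape(name), cites
            ),
            encoding="utf-8",
        )
        # The index links to each item page; each item page links back.
        index_links.append(
            '<li><a href="{}">{}</a></li>'.format(page.name, html.escape(name))
        )
    index = out / "index.html"
    index.write_text(
        "<html><body><h1>Items</h1><ul>{}</ul></body></html>".format(
            "".join(index_links)
        ),
        encoding="utf-8",
    )
    return index
```

Regenerating the page for an item whenever a citation is attached to it would keep the repository in step with the model, and the output is static HTML that any Web browser can display.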

[0218] To support the sharing of biological stories 
amongst groups of collaborating colleagues, the 
present invention generates a Web page for every node 
that appears in the Story Editor 40. Thus, every biolog- 
ical story can have its own Web page. The Players 44 
displayed on the Web page for the biological story contain
links to the Web pages for the items and collections
represented by the Players 44 in the biological story. For
example, the Web page in Fig. 10 points to the actual
item for "pdgfra", not to the Player that references it. A
Player is actually a reference to an item, not the item
itself. This distinction is important because the user can
annotate a Player and item separately, which allows the
use of annotations of the Player as a way to denote contextual
information as it relates to the item's role in a
particular story. That is, the same item could be a Player
in multiple stories (or even in multiple places, such as
alternatives, in the same story). Therefore, having a distinct
Player element allows the user to annotate specific
information about the item's role in the story, distinct
from direct annotations on the item itself. Thus, a collaborator
who visits the Web page for a biological story
can navigate throughout the entire context surrounding
that biological story. The Web page is a richly interconnected
map of the user's train of thinking in building up
a particular set of explanations and/or hypotheses. Note
that the collaborator does not specifically need to be us- 
ing the software described in this invention in order to 
navigate through the Web repository for a story. Any 
Web browser will suffice for this purpose. 
[0219] If a colleague is using the program described
in this invention, rather than a Web browser, for navigating
a biological story, then this colleague can serve as
a "reviewer" and add annotations. This can be done using
the mechanisms for annotation described earlier. The
software tags such annotations with the "reviewer's"
name and also a time stamp, so that annotations from
different colleagues can be distinguished and chronologically
ordered.
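The tagging scheme can be pictured with a small record type that stores the author and creation time alongside the note. This is a sketch under assumed names (`Annotation`, `chronological`, `by_author`), not the data model of the actual software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    """A textual note tagged with its author and a time stamp."""
    text: str
    author: str
    # Defaults to the moment of creation, in UTC.
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def chronological(annotations):
    # Order annotations from oldest to newest by their time stamp.
    return sorted(annotations, key=lambda a: a.created)


def by_author(annotations, author):
    # Distinguish one colleague's annotations from everyone else's.
    return [a for a in annotations if a.author == author]
```

Storing the time stamp in UTC is a design choice of this sketch; it keeps the ordering meaningful when collaborators work in different time zones.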

SAVING WORK IN PROGRESS 

[0220] In the present invention, a model is the central
data structure of the software system and it encompasses
all the information including experimental data,
annotations, categorization, and textual and graphical
explanations of biological processes. Thus, a model embodies
the current state of work-in-progress of the user.
This state of work can be saved by invoking the "Save
Model As" operation 16 on the File menu 10 shown in
Fig. 3. All items, collections, and stories (both textual
and graphical) are written to persistent storage, such as
a file, using XML Web technology described at
[http://w3.org]. All the links to detailed information associated
with the items, collections, and stories are saved along
with them. Other contextual information, such as the
coordinates of nodes placed in the Diagram Editor 50
component, are also saved. All this information is restored
the next time the program is run.
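A minimal sketch of this kind of XML round trip follows, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`model`, `items`, `item`, `link`, `diagram`, `node`) are invented for illustration; the actual file format of the system is not specified here.

```python
import xml.etree.ElementTree as ET


def model_to_xml(items, node_positions):
    """Serialize items (name -> list of link URLs) and diagram node
    coordinates (name -> (x, y)) to an XML string.

    The element names used here are hypothetical.
    """
    root = ET.Element("model")
    items_el = ET.SubElement(root, "items")
    for name, links in items.items():
        item_el = ET.SubElement(items_el, "item", name=name)
        for url in links:
            ET.SubElement(item_el, "link", href=url)
    diagram_el = ET.SubElement(root, "diagram")
    for name, (x, y) in node_positions.items():
        ET.SubElement(diagram_el, "node", name=name, x=str(x), y=str(y))
    return ET.tostring(root, encoding="unicode")


def model_from_xml(text):
    """Restore the two dictionaries written by model_to_xml."""
    root = ET.fromstring(text)
    items = {
        el.get("name"): [link.get("href") for link in el.findall("link")]
        for el in root.find("items")
    }
    positions = {
        el.get("name"): (int(el.get("x")), int(el.get("y")))
        for el in root.find("diagram")
    }
    return items, positions
```

Saving the node coordinates next to the items is what lets the diagram reappear with the same layout when the model is loaded again.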
[0221] When saving a model, if there is not currently
a persistent store (e.g. a file) for the model, then the user
is prompted for a name for the model via a "file chooser"
dialog. This is the case when the Save Model As operation
16 is invoked; the user will be prompted for a name
for the model. In the case where the operation Save
Model 17 has been invoked and there already exists a
persistent store (e.g. a file) for that model, then the system
will just overwrite the persistent store with the contents
of the current model.

[0222] For safety purposes, the software will also
prompt to save the current model upon exiting the program.
Invoking the Quit item 18 on the File menu shown
in Fig. 3 also causes the software to display a dialog
box, asking to save changes.

[0223] The user can also load in an existing model
from a persistent store (e.g. a file) by invoking the Load
Model 19 operation on the File menu 10 shown in Fig.
3. Prior to loading in the model, the user will be prompted
about whether to save changes made to the currently
loaded model before loading in a model from persistent
store. After that, the system will present a "file chooser"
dialog, from which the user can choose an existing model
to load.
[0224] While the present invention has been described
with reference to the specific embodiments
thereof, it should be understood by those skilled in the
art that various changes may be made and equivalents
may be substituted without departing from the true spirit
and scope of the invention. In addition, many modifications
may be made to adapt a particular situation, data
type, network, user need, process, process step or
steps, to the objective, spirit and scope of the present
invention. All such modifications are intended to be within
the scope of the claims appended hereto.



Claims 

1. A system for organizing information across external
information objects, comprising:

a results manager (20) for importing and viewing
detailed experimental results as one type of
representation of external information objects;

a collection manager (30) for creating and manipulating
collections of items representing external
information objects;

a story editor (40) for providing a narrative
structure for textually organizing information
about interactions between items, collections
or items and collections; and

a diagram editor (50) for incorporating items,
collections or items and collections, into a
graphical representation of a story;

wherein an update of information contained in any
one of the components comprising the results manager,
collection manager, story editor, and diagram
editor is made in the remainder of the components.

2. The system of claim 1, wherein the update of information
is automatically made in the remainder of
the components.

3. The system of claim 1 or 2, further comprising

a controller means; and

at least one listener means associated with one
of the results manager, the collection manager,
the story editor and the diagram editor,

wherein the controller means detects changes
in the information in one of the components and
generates a message in response to the change of
information, the message indicating which information
was changed, and

wherein the listener means receives the message
from the controller and causes the component
with which the listener means is associated to update
a specific information in the component on the
basis of the information contained in the message.

4. The system of claim 3, wherein all components
have an associated listener means.

5. The system of one of claims 1 to 3, further
comprising:

means for overlaying information from one or
more of the results manager (20), collection
manager (30), story editor, and diagram editor
(50) on one or more viewers of the results manager,
collection manager, story editor, and diagram
editor.

6. The system of claim 5, wherein in the results manager
the experimental results are color encoded,
and wherein the means for overlaying further com- 
prises means for selecting a color encoding for an 
associated experimental result, and means for 
mapping the selected color encoding to elements in 
the story editor and/or in the diagram editor associ- 
ated with the specific experimental result which 
color encoding was selected.

7. The system of one of claims 1 to 6, wherein the diagram
editor (50) is adapted to automatically put together
a diagram (52) representing a relationship
between elements in the story editor (40).

8. The system of claim 7, wherein the diagram editor
(50) comprises means for searching the story editor
for nouns and verbs, wherein nouns represent specific
elements in the story editor (40), and verbs represent
specific interactions between the elements
in the story editor (40), means for implementing the
nouns as diagram nodes (56), and means for implementing
verbs as diagram interactions (58).

9. The system of one of claims 1 to 8, further comprising
means for importing the external information objects
into the results manager (20), the external information
objects being provided in a first format, the
external information objects being arranged in the
results manager (20) in a second format, wherein
said means for importing comprises means for converting
the external information objects from the first
format into the second format.

10. The system of claim 9, wherein the first format is a
spreadsheet with tab-separated columns, and
wherein the external information objects comprise
an additional header, the header being in the form
of:

unigene-id<tab>gene-name<tab><format><col>-<name><tab>

where

<format> is one of:

double -- specifies that this column represents
a Double value,
int -- specifies that this column represents
an Integer value,
text -- specifies that this column represents
a text value,
data -- specifies that this column represents
a Double value,

<col> specifies the column where this data
should be initially presented in the results manager,

<name> specifies the actual name of the column,

'unigene-id' is the header for the field that specifies
the identifier in the Unigene database for
the item.

11. The system of one of claims 1 to 10, wherein the
collection manager comprises means for searching
the results manager (20) for specific terms, and
means for building the collection on the basis of
matched terms in the results manager.

12. The system of claim 11, wherein the collection manager
further comprises a dialog box for receiving the
specific terms.

13. The system of one of claims 1 to 12, further com- 
prising an object editor for adding as well as editing 
annotations to items, collections, stories, interac- 
tions, and graphical representations of stories. 

14. The system of claim 13, further comprising means 
for tagging each said annotation with the user name 
who created it and with a time stamp indicating the 
time of creation of the annotation, respectively. 

15. The system of one of claims 1 to 14, further comprising
means for generating a web repository, the
web repository including a web page for each item, 

16. The system of one of claims 1 to 15,
wherein the results manager, collection manager,
story editor, and diagram editor support the display 
of the information, 

17. A method of verifying and validating experimental 
data, the method comprising: 

importing the experimental data into a results 
manager (20); 

overlaying items selected from the results manager
onto a textual story provided in a story editor
(40) or onto a graphical story provided in a
diagram editor (50);

comparing the overlaid items with the informa- 
tion in the textual story or the graphical story; 
and 

indicating the result of the comparison in the 
story editor or in the diagram editor. 

18. The method of claim 17, wherein the overlaying 
comprises selecting an item in the results manager 
or at least one node or interaction in the graphical 
story.

19. The method of claim 17 or 18, wherein in the results
manager the experimental results are color encoded,
and the overlaying further comprises selecting
a color encoding for an associated experimental result,
and mapping the selected color encoding to elements
in the story editor and/or in the diagram editor
associated with the specific experimental result
which color encoding was selected.